CONTROLLED PROBABILITY PROPORTIONAL TO SIZE
SAMPLING DESIGNS*

By A. Hedayat and B.Y. Lin
Department of Mathematics
University of Illinois, Chicago

ABSTRACT

Any sampling design, d, of size n without replacement based on a finite population U of N units or strata can be formally presented by a pair $(S_d, P_d)$, where $S_d$, called the support of d, is any set of subsets of size n each based on the elements of U such that the (set theoretic) union of these subsets, called samples, is U, and $P_d$ is a strictly positive probability distribution on $S_d$. A sampling design is said to be a probability proportional to size design, denoted by PPS(N,n), if the probability that the unit i is selected in a random sample is proportional to a known positive quantity associated with the unit i, i = 1,2,...,N. The literature of survey sampling offers a PPS(N,n) whose support $S_d$ consists of all $\binom{N}{n}$ possible samples. Here we give an easily applicable technique for the construction of PPS(N,n) designs with various support sizes and various probabilities on each support. Such sampling designs are needed for controlled sampling when some samples are undesirable to choose or when we need to minimize (or maximize) the probabilities of selection of certain samples.



1. Introduction. Let U be a finite population of N units or N strata. A sampling design without replacement, d, based on U is a pair $(S_d, P_d)$, where $S_d$, called the support of d, is any set of nonempty subsets of U and $P_d$ is a strictly positive probability distribution on $S_d$, with

(1.1)  $\bigcup_{s_d \in S_d} s_d = U.$

Every member of $S_d$ is called a sample, and a random sample (or probability sample) is a sample selected by implementing d. In general, samples in $S_d$ may have different sizes.

In order to implement d we must know the precise structures of $S_d$ and $P_d$. However, for the purpose of customary statistical analysis of the data collected via a random sample $s_d$, all we need to know are the following quantities, called respectively the first order and the second order inclusion probabilities:

(1.2)  $\Pi_{di}$ = prob. that a random sample will contain the unit i $= \sum_{s_d \ni i} P_d(s_d)$,

(1.3)  $\Pi_{dij}$ = prob. that a random sample will contain the units i and j, $j \ne i$, $= \sum_{s_d \ni i,j} P_d(s_d)$.

Note that by (1.1), $\Pi_{di} > 0$. However, $\Pi_{dij}$ can be zero for some i and j. Indeed, some of the classical sampling designs, such as systematic sampling, have the undesirable property that $\Pi_{dij} = 0$ for some i and j. If we are interested in unbiased estimation of variances of linear estimators we should avoid such sampling designs.
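The two inclusion probabilities can be computed mechanically once $S_d$ and $P_d$ are written down. The following is a minimal sketch, assuming the design is stored as a Python dictionary from samples to probabilities; the function name `inclusion_probabilities` and the small systematic-type design are hypothetical illustrations, not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations


def inclusion_probabilities(design):
    """Return first- and second-order inclusion probabilities.

    `design` maps each sample (a frozenset of unit labels) to its
    probability P_d(s_d).  This layout is an assumption of the sketch.
    """
    units = sorted(set().union(*design))
    # (1.2): add P_d(s_d) over the samples containing unit i
    first = {i: sum(p for s, p in design.items() if i in s) for i in units}
    # (1.3): add P_d(s_d) over the samples containing both i and j
    second = {
        (i, j): sum(p for s, p in design.items() if i in s and j in s)
        for i, j in combinations(units, 2)
    }
    return first, second


# Hypothetical systematic-type design on U = {1, 2, 3, 4} with n = 2
design = {
    frozenset({1, 3}): Fraction(1, 2),
    frozenset({2, 4}): Fraction(1, 2),
}
first, second = inclusion_probabilities(design)
```

In this toy design every first order probability is positive while several second order probabilities, such as the one for the pair (1, 2), are zero, which is the undesirable situation described above.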

In the context of our discussion, the papers in the literature of survey sampling are basically of two types:

1. Those which do not specify $S_d$ and $P_d$ but rather give procedures for drawing random samples. Among these, some specify the $\Pi_{di}$'s and $\Pi_{dij}$'s, and some give only the $\Pi_{di}$'s and leave the burden of deriving the $\Pi_{dij}$'s to the reader. Most papers are of this latter type.

2. Those which specify $S_d$ and $P_d$. The values of the $\Pi_{di}$'s and $\Pi_{dij}$'s are either given or can be easily computed by knowing $S_d$ and $P_d$. Unfortunately, only a few papers are of this type.

2. PPS Sampling. A purpose of survey sampling is to study a characteristic of interest by a random sample chosen via a sampling design. Let us denote this characteristic of interest by y. It is often the case that besides the characteristic y there exists some auxiliary characteristic x which is related to y and whose values are available to us. So, to each unit i in U there are associated two measurements $X_i$ (known to us) and $Y_i$, corresponding to the characteristics x and y respectively. We want to estimate the population total $Y = Y_1 + Y_2 + \dots + Y_N$ utilizing the information provided by a random sample generated by a sampling design d. It is known (see the list of papers at the end) that in some cases we can improve the precision of our estimator of Y if we properly utilize the information provided by the $X_i$'s in the formation of the sampling design. One such sampling design, popular among survey statisticians, is called the probability proportional to size (PPS) sampling design. In our notation this is defined as follows.

Definition 2.1. A sampling design, $d = (S_d, P_d)$, based on U is called a probability proportional to size design of size n, designated by PPS(N,n), if (i) each sample in $S_d$ consists of n units, and (ii) the probability $\Pi_{di}$ is proportional to $q_i$ (hence the name), where $q_i = X_i / \sum_{j=1}^{N} X_j$.

Since in any sampling design based on samples of size n

(2.1)  $\sum_{i=1}^{N} \Pi_{di} = \sum_{i=1}^{N} \sum_{s_d \ni i} P_d(s_d) = n,$

therefore in a PPS(N,n) sampling design

(2.2)  $\Pi_{di} = n q_i,$

which puts the requirement $n q_i \le 1$ for the existence of such designs. As we shall see later, without further demands on d such sampling designs always exist.
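As a small illustration of Definition 2.1 and of the requirement attached to (2.2), the sketch below turns auxiliary measurements into unit sizes and checks admissibility for a given n. The helper names `unit_sizes` and `admissible` and the measurements in `x` are hypothetical and serve only as an example.

```python
from fractions import Fraction


def unit_sizes(x):
    """q_i = X_i / (X_1 + ... + X_N), kept as exact fractions."""
    total = sum(x)
    return [Fraction(xi, total) for xi in x]


def admissible(q, n):
    """Check q_i > 0, q_1 + ... + q_N = 1 and n * q_i <= 1 for every i."""
    return (
        all(qi > 0 for qi in q)
        and sum(q) == 1
        and all(n * qi <= 1 for qi in q)
    )


x = [3, 1, 2, 3, 2]          # hypothetical integer auxiliary measurements
q = unit_sizes(x)            # 3/11, 1/11, 2/11, 3/11, 2/11
targets = [2 * qi for qi in q]   # required inclusion probabilities for n = 2
```

With these sizes a sample size of n = 2 or n = 3 is admissible, whereas n = 4 is not, since 4(3/11) exceeds one.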

Here we would like to emphasize two points. First, $q_i$ does not have to be of the form $X_i / \sum_{j=1}^{N} X_j$. In general, we allow $q_i$, referred to as the size of the unit i, to be any positive number as long as $q_1 + q_2 + \dots + q_N = 1$ and $n q_i \le 1$. However, for all practical purposes we can assume $q_i$ to be a rational number. Second, we should take advantage of the mild requirements of PPS(N,n) in preparing a sampling design which meets other useful requirements. We shall explain this latter point in detail later on.

In the literature of survey sampling, PPS sampling designs belong to a celebrated family of sampling designs known as unequal probability sampling designs without replacement. Our purpose here is not to review the literature on this family but rather to indicate where and how our contributions fit in the literature dealing primarily with PPS sampling designs. The reader interested in the subject of unequal probability sampling designs should consult the selected bibliography and the corresponding references at the end of the paper.

The first formal attempt to construct a PPS(N,n) sampling design was undertaken by Goodman and Kish (1950). These authors do not specify $S_d$ or $P_d$ but rather give a procedure for drawing a random sample which guarantees the proportionality of $\Pi_{di}$ to $q_i$. It is extremely difficult to derive a general expression for the $\Pi_{dij}$'s of the procedure of Goodman and Kish since the mathematical structure of their procedure is quite complicated. Hartley and Rao (1962) used asymptotic theory to approximate the $\Pi_{dij}$'s of the procedure of Goodman and Kish. Though it is difficult to specify $P_d$ of the corresponding design of Goodman and Kish, it is easy to see that $S_d$ consists of all $\binom{N}{n}$ possible samples, each receiving positive probability. Brewer (1963) and Durbin (1967) were able to construct PPS(N,2) sampling designs. Their designs have the property that $\Pi_{dij} > 0$ and therefore $S_d$ consists of all $\binom{N}{2}$ possible samples.

Sampford (1967), inspired by the results of Brewer (1963) and Durbin (1967), was able to construct PPS(N,n) sampling designs for all N and n. Again the support of Sampford's design is all $\binom{N}{n}$ possible samples. Sampford designs have the desirable properties that $\Pi_{dij} > 0$ and $\Pi_{dij} < \Pi_{di}\Pi_{dj}$. The statistical usefulness of these latter properties can be argued as follows. To estimate the population total Y we can use the Horvitz-Thompson linear unbiased estimator


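(2.3)  $\hat{Y}_{HT} = \sum_{i \in s_d} \frac{Y_i}{\Pi_{di}}.$

It can be easily verified that

(2.4)  $\mathrm{Var}(\hat{Y}_{HT}) = \sum_{i=1}^{N} \sum_{j=i+1}^{N} (\Pi_{di}\Pi_{dj} - \Pi_{dij}) \left( \frac{Y_i}{\Pi_{di}} - \frac{Y_j}{\Pi_{dj}} \right)^2$

and can be unbiasedly estimated by the Yates-Grundy estimator

(2.5)  $\widehat{\mathrm{Var}}(\hat{Y}_{HT}) = \sum_{i<j;\ i,j \in s_d} \frac{\Pi_{di}\Pi_{dj} - \Pi_{dij}}{\Pi_{dij}} \left( \frac{Y_i}{\Pi_{di}} - \frac{Y_j}{\Pi_{dj}} \right)^2$

provided $\Pi_{dij} > 0$. In practice it is desirable that this estimator be always nonnegative. The property that $\Pi_{dij} < \Pi_{di}\Pi_{dj}$ guarantees this.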
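To make the roles of (2.3) and (2.5) concrete, here is a minimal sketch that evaluates both estimators on every sample of a tiny design and averages them under $P_d$. The function names, the y-values and the use of simple random sampling of n = 2 out of N = 3 units (so that every $\Pi_{di}$ is 2/3 and every $\Pi_{dij}$ is 1/3) are assumptions chosen for illustration only.

```python
from fractions import Fraction
from itertools import combinations


def horvitz_thompson(sample, y, pi):
    """Estimator (2.3): sum of Y_i / Pi_i over the sampled units."""
    return sum(Fraction(y[i]) / pi[i] for i in sample)


def yates_grundy(sample, y, pi, pij):
    """Estimator (2.5); requires pij[(i, j)] > 0 for sampled pairs i < j."""
    total = Fraction(0)
    for i, j in combinations(sorted(sample), 2):
        weight = (pi[i] * pi[j] - pij[(i, j)]) / pij[(i, j)]
        diff = Fraction(y[i]) / pi[i] - Fraction(y[j]) / pi[j]
        total += weight * diff ** 2
    return total


# Hypothetical population and design (equal probabilities on all pairs)
y = {1: 3, 2: 6, 3: 9}
samples = list(combinations([1, 2, 3], 2))
prob = {s: Fraction(1, 3) for s in samples}
pi = {i: Fraction(2, 3) for i in y}
pij = {s: Fraction(1, 3) for s in samples}

expected_ht = sum(prob[s] * horvitz_thompson(s, y, pi) for s in samples)
true_var = sum(prob[s] * (horvitz_thompson(s, y, pi) - expected_ht) ** 2
               for s in samples)
expected_yg = sum(prob[s] * yates_grundy(s, y, pi, pij) for s in samples)
```

In this toy case the average of the Horvitz-Thompson estimates equals the population total and the average of the Yates-Grundy estimates equals the true variance, as unbiasedness requires; exact fractions are used so the equalities hold without rounding.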

The additional property that $\Pi_{dij} > 0$ forces the size of the support of PPS(N,n) to be $\binom{N}{2}$ if n = 2. However, for n > 2 it is possible to construct PPS(N,n) sampling designs whose supports have fewer than $\binom{N}{n}$ samples in them. This allows us to put zero probability on samples which we consider to be undesirable or uneconomical to collect data from. The published literature provides no such opportunities. We are also able to construct, for the given N, n and sizes $q_1, q_2, \dots, q_N$, various $S_d$ with varieties of $P_d$. Again, here we are able to control the members of $S_d$ and their corresponding probabilities. This means that we can either exclude undesirable samples from $S_d$ or put very small probabilities on them for selection purposes. If we are interested in all $\binom{N}{n}$ samples then our procedure allows us to construct various $P_d$, in contrast to the procedure of Sampford, which provides no choice at all. Before we close this section we give two examples to elucidate the above points.


Example 2.1. Let U be a stratum of size N = 5 with the following sizes: $q_1 = 3/11$, $q_2 = 1/11$, $q_3 = 2/11$, $q_4 = 3/11$ and $q_5 = 2/11$. Suppose we want to select a sample of size n = 2 by the method of PPS sampling. Further, we desire that $\Pi_{ij} > 0$ and $\Pi_{ij} < \Pi_i \Pi_j$ so that the Yates-Grundy estimator (2.5) does not take negative values. Since n = 2 and we require that $\Pi_{ij} > 0$, any PPS sampling design should have all $\binom{5}{2} = 10$ samples of size 2 in the support. So we have no problem concerning the construction of the support. Therefore, all we have to do is to specify probabilities on these samples. For the given N, n, $q_1, q_2, \dots, q_N$, Sampford (1967) gives one such set of probabilities. Since in this case n = 2, the Sampford probabilities are identical to those of Durbin (1967). Our procedure (see Section 3) gives several such possibilities. Below we list two such choices.

Thus in this case we could not control the support, due to the restrictions imposed on the design, but we could control the probabilities on the samples.


                     Probability on the Support

Support        Durbin/Sampford      Examples of our designs
(samples)          design               1            2

   12             147/2497             1/22         1/22
   13             324/2497             3/22         3/22
   14             567/2497             6/22         5/22
   15             324/2497             2/22         3/22
   23              80/2497             1/22         1/22
   24             147/2497             1/22         1/22
   25              80/2497             1/22         1/22
   34             324/2497             2/22         3/22
   35             180/2497             2/22         1/22
   45             324/2497             3/22         3/22
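The three probability columns of Example 2.1 can be checked against the stated requirements with a few lines of exact arithmetic. The sketch below is only a verification aid; the function name `check` and the dictionary layout are assumptions, while the numbers are those of the table.

```python
from fractions import Fraction
from itertools import combinations

q = {1: Fraction(3, 11), 2: Fraction(1, 11), 3: Fraction(2, 11),
     4: Fraction(3, 11), 5: Fraction(2, 11)}
samples = list(combinations(range(1, 6), 2))   # 12, 13, ..., 45 in table order

# Numerators and common denominator of each column of the table
columns = {
    "durbin_sampford": ([147, 324, 567, 324, 80, 147, 80, 324, 180, 324], 2497),
    "ours_1": ([1, 3, 6, 2, 1, 1, 1, 2, 2, 3], 22),
    "ours_2": ([1, 3, 5, 3, 1, 1, 1, 3, 1, 3], 22),
}


def check(numerators, denom, n=2):
    """True if the column is a PPS(5,2) design with 0 < Pi_ij < Pi_i * Pi_j."""
    p = {s: Fraction(a, denom) for s, a in zip(samples, numerators)}
    pi = {i: sum(v for s, v in p.items() if i in s) for i in q}
    is_distribution = sum(p.values()) == 1
    is_pps = all(pi[i] == n * q[i] for i in q)
    # For n = 2 the second order probability of (i, j) is P_d of that sample
    yg_ok = all(0 < p[(i, j)] < pi[i] * pi[j] for (i, j) in samples)
    return is_distribution and is_pps and yg_ok


results = {name: check(nums, d) for name, (nums, d) in columns.items()}
```

All three columns pass, which illustrates the point of the example: the support is forced, but the probabilities on it are not unique.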



Example 2.2. Suppose we have a stratum of N = 6 units and would like to select 3 units based on a PPS sampling with the following sizes: $q_1 = 2/17$, $q_2 = 3/17$, $q_3 = 4/17$, $q_4 = 1/17$, $q_5 = 2/17$, $q_6 = 5/17$. Let us also require that $\Pi_{ij} > 0$ and $\Pi_{ij} < \Pi_i \Pi_j$. As we said, the only available procedure in the published literature is that of Sampford (1967). In the following table we give the Sampford design as well as a design generated by our procedure.


          Sampford design                          Our design

S_d: support    P_d: probability        S_d: support    P_d: probability
(samples)       on each sample          (samples)       on each sample

   123           44352/1529898             123               1/34
   124            5445/1529898             126               3/34
   125           12600/1529898             136               7/34
   126          121275/1529898             145               1/34
   134           10560/1529898             234               1/34
   135           24192/1529898             235               1/34
   136          221760/1529898             236               7/34
   145            2880/1529898             246               2/34
   146           29700/1529898             256               3/34
   156           67200/1529898             346               1/34
   234           19602/1529898             356               6/34
   235           44352/1529898             456               1/34
   236          381150/1529898
   245            5445/1529898
   246           54450/1529898
   256          121275/1529898
   345           10560/1529898
   346          101640/1529898
   356          221760/1529898
   456           29700/1529898
Our design has all the desirable properties which Sampford's design has. In addition, our design puts zero probability on the following eight samples: 124, 125, 134, 135, 146, 156, 245, 345. Therefore, our sampling design could be utilized for controlled sampling if our desire is not to select these samples. So, in this case, we controlled the support of the sampling design.
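The claims about the second design of Example 2.2 can be confirmed directly from its twelve samples. The following sketch, whose variable names are illustrative assumptions, recomputes the first order inclusion probabilities, lists the samples that receive zero probability, and checks that every pair of units is still covered.

```python
from fractions import Fraction
from itertools import combinations

q = [Fraction(k, 17) for k in (2, 3, 4, 1, 2, 5)]

# Numerators (over 34) of "our design" in Example 2.2
ours = {(1, 2, 3): 1, (1, 2, 6): 3, (1, 3, 6): 7, (1, 4, 5): 1,
        (2, 3, 4): 1, (2, 3, 5): 1, (2, 3, 6): 7, (2, 4, 6): 2,
        (2, 5, 6): 3, (3, 4, 6): 1, (3, 5, 6): 6, (4, 5, 6): 1}
p = {s: Fraction(w, 34) for s, w in ours.items()}

# First order inclusion probabilities for units 1, ..., 6
pi = [sum(v for s, v in p.items() if i in s) for i in range(1, 7)]

# Samples of size 3 that are left out of the support
excluded = [s for s in combinations(range(1, 7), 3) if s not in p]

# Pairs of units that occur together in at least one sample
covered = {pair for s in p for pair in combinations(s, 2)}
all_pairs_covered = covered == set(combinations(range(1, 7), 2))
```

The list `excluded` reproduces the eight samples named in the text, and `pi` equals 3q_i for every unit.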

The technique which we shall present in Section 3 can be adjusted to accommodate certain "reasonable" demands on the composition of $S_d$ or the structure of $P_d$. However, we do not wish to leave the impression that:

(i) we can freely choose the samples in the support; or

(ii) we can arbitrarily manipulate the probabilities on the samples.

Clearly these cannot be done. For example, the demand that $\Pi_{dij} > 0$ for all i and j, $i \ne j$, puts an obvious restriction on the composition of $S_d$ and its cardinality, i.e., the samples in $S_d$ must form a cover for all pairs, which in turn puts a lower bound on the number of samples in $S_d$. Likewise, we cannot in general ask for a sampling design which puts zero probability on certain undesirable samples. What our technique is capable of doing is to minimize such probabilities, though in some situations it can indeed exclude such samples from the support.

3. Construction of Controlled PPS Sampling Designs. As we pointed out in Section 2, we can utilize Sampford's technique to construct a PPS(N,n) sampling design for every population size N, sample size n and any set of admissible unit sizes $q_1, q_2, \dots, q_N$. While Sampford's technique serves that purpose, it is not applicable at all if we want to construct controlled PPS(N,n) sampling designs. In this section we shall provide a very general technique for the construction of such sampling designs. Our technique has no similarity to the technique of Sampford. Moreover, our technique enjoys the following practically useful features. (i) It is an easy technique to understand and to utilize in practical situations. (ii) It is a very flexible technique in the sense that we can adjust it to produce desirable sampling designs. For example, for given N, n and admissible unit sizes $q_1, q_2, \dots, q_N$ it is possible to adjust the technique to produce many PPS sampling designs with various support sizes and various probabilities on the samples in each support. This flexibility allows us to construct controlled PPS(N,n) sampling designs.

We shall now establish a result which is needed for the development of our technique to follow. Suppose we have N nonempty boxes containing $k_1, k_2, \dots, k_N$ objects. For a given integer $n \le N$, a round of size n is defined to be a process by which we select n boxes and remove one object from each box. Now the problem is this: what are the necessary and sufficient conditions on N, n, $k_1, k_2, \dots, k_N$ so that all the objects can be removed from the N boxes by a series of successive rounds of size n? This problem is completely solved in the following lemma.

Lemma 3.1. The necessary and sufficient conditions for removing $k_1 + k_2 + \dots + k_N = M$ objects from N boxes by a series of successive rounds of size n are:

(1)  $M \equiv 0 \pmod{n}$;

(2)  $\max_i k_i \le M/n$.

Proof. Necessity. (1) In each round of size n we remove n objects, so the total number of objects, M, must be a multiple of n. (2) It takes precisely M/n rounds of size n to pick all the M objects. Therefore, no $k_i$ can exceed the total number of rounds M/n.

Sufficiency. Consider the following procedure. At round one we remove one object from each of the n boxes containing the largest number of objects. Similarly, we proceed with the remaining (M/n) - 1 rounds. We claim that this procedure will succeed in removing all the M objects as long as M is a multiple of n and $k_i \le M/n$, i = 1,2,...,N. To simplify the proof we shall distinguish two distinct cases:

Case 1. There are precisely n boxes each containing M/n objects.

Case 2. There are fewer than n boxes each containing M/n objects.

Note that there cannot be more than n boxes each containing M/n objects.
Also, N = n in Case 1 and N > n in Case 2. It is clear that our procedure will succeed in Case 1. In Case 2 we shall establish that our procedure will end up in a case similar to Case 1 for the reduced values of M and the $k_i$'s, and thus we will be able to remove all the M objects. To show this we shall prove two things. First, we shall prove that at no time do the reduced values of M and the $k_i$'s contradict the necessary conditions (1) and (2) of the lemma.

After the completion of round j, let $k_i^{(j)}$ and $M_j$ be the number of objects left in the ith box and the total number of objects left in the N boxes, respectively.

Now we claim that:

(1')  $M_j \equiv 0 \pmod{n}$;

(2')  $\max_i k_i^{(j)} \le M_j/n$.

(1') is obvious by assumption (1) and the fact that $M_j = M - jn$.

(2') can be argued as follows. By assumption (2), $\max_i k_i \le M/n$. Thus at the end of round j of our procedure $\max_i k_i^{(j)} \le \max_i k_i - j$, which yields $\max_i k_i^{(j)} \le (M/n) - j = M_j/n$. As we can see, conditions (1') and (2') are equivalent to conditions (1) and (2) of the lemma for the integers n, $M_j$ and $k_1^{(j)}, k_2^{(j)}, \dots, k_N^{(j)}$. Therefore, at the beginning of each round the system satisfies the necessary conditions.

Second, at the (j+1)th round we will be faced with two possibilities. Either N-n boxes will be empty and each of the remaining n boxes contains the same number of objects, in which case, as we pointed out in Case 1 above, our procedure will clearly succeed; or otherwise we will continue the rounds. If a situation as above never arises before round (M/n)-1, then round (M/n)-1 will produce a situation as above with one object in each of n boxes, and thus by round M/n all the objects will be removed. Note that the conditions of the lemma exclude the possibility of ending up with more than N-n empty boxes at any round.
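The procedure used in the sufficiency part of the proof can be sketched in a few lines. This is a minimal illustration, assuming that ties among equally full boxes are broken in favor of the lower box number; the lemma leaves that choice open, and the function names `removable` and `greedy_rounds` are hypothetical.

```python
def removable(k, n):
    """Conditions (1) and (2) of Lemma 3.1 for box contents k and round size n."""
    M = sum(k)
    return M % n == 0 and max(k) <= M // n


def greedy_rounds(k, n):
    """Empty the boxes by always drawing from the n fullest boxes.

    Returns the list of rounds, each a tuple of 1-based box numbers.
    Ties are broken by the lower box number (an assumption of this sketch).
    """
    if not removable(k, n):
        raise ValueError("conditions (1) and (2) of Lemma 3.1 are violated")
    residual = list(k)
    rounds = []
    while sum(residual) > 0:
        # Order boxes by decreasing content, then by increasing box number
        order = sorted(range(len(residual)), key=lambda i: (-residual[i], i))
        chosen = sorted(order[:n])
        if any(residual[i] == 0 for i in chosen):
            raise RuntimeError("fewer than n nonempty boxes remain")
        for i in chosen:
            residual[i] -= 1
        rounds.append(tuple(i + 1 for i in chosen))
    return rounds


# Box contents taken from Example 3.1: N = 6, n = 3, M = 51
k = [6, 9, 12, 3, 6, 15]
rounds = greedy_rounds(k, 3)
```

With the contents of Example 3.1 this sketch needs exactly M/n = 17 rounds, and box i is drawn from exactly $k_i$ times. Because of the tie-breaking rule, the individual rounds need not coincide with those tabulated in the paper.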
Remark 3.1. If in addition to conditions (1) and (2) above N is a multiple of n, then it is possible that we end up in some round with all N boxes containing the same number of objects. Clearly, we can go on with our procedure in such a situation in a trivial manner.


The  following  example  will  elucidate  the  procedure  outlined  in  the 
lemma  and  the  point  mentioned  in  the  above  remark. 


Example 3.1. Consider the following system: N = 6, n = 3, $k_1 = 6$, $k_2 = 9$, $k_3 = 12$, $k_4 = 3$, $k_5 = 6$ and $k_6 = 15$. Here M = 51 and M/n = 17. The necessary conditions are satisfied. The following 17 rounds of size 3 will remove all the 51 objects.

Box No.                   1    2    3    4    5    6
no. of objects: k_i       6    9   12    3    6   15

Round 1                        1    1              1
Residuals                 6    8   11    3    6   14
Round 2                        1    1              1
Residuals                 6    7   10    3    6   13
Round 3                        1    1              1
Residuals                 6    6    9    3    6   12
Round 4                   1         1              1
Residuals                 5    6    8    3    6   11
Round 5                             1         1    1
Residuals                 5    6    7    3    5   10
Round 6                        1    1              1
Residuals                 5    5    6    3    5    9
Round 7                   1         1              1
Residuals                 4    5    5    3    5    8
Round 8                        1              1    1
Residuals                 4    4    5    3    4    7
Round 9                   1         1              1
Residuals                 3    4    4    3    4    6
Round 10                       1    1              1
Residuals                 3    3    3    3    4    5
Round 11                  1                   1    1
Residuals                 2    3    3    3    3    4
Round 12                       1         1         1
Residuals                 2    2    3    2    3    3
Round 13                            1         1    1
Residuals                 2    2    2    2    2    2
Round 14                  1              1    1
Residuals                 1    2    2    1    1    2
Round 15                       1    1              1
Residuals                 1    1    1    1    1    1
Round 16                            1    1         1
Residuals                 1    1    0    0    1    0
Round 17                  1    1              1
Residuals                 0    0    0    0    0    0

Remarks 3.2. We would like to make the following important observations in the context of Example 3.1. (i) Except in rounds 1, 2, 3, 6, 13 and 15 we had more than one choice in selecting n = 3 boxes. (ii) We could modify our procedure and empty boxes 3, 5 and 6 in rounds 13, 14 and 15 and boxes 1, 2 and 4 in rounds 16 and 17. (iii) After rounds 13 and 15 we ended up with two examples of the case pointed out in Remark 3.1. Again note that in a situation like this we could easily modify our procedure and empty the boxes in several ways.

The above example clearly demonstrates that in general there are many options in the formation of rounds and that our procedure can easily be modified throughout the process. These properties are important when we apply our procedure in sampling from finite populations.
We shall now apply Lemma 3.1 and prove the following theorems. In Theorem 3.1 we shall give a technique for the construction of PPS(N,n) sampling designs directly based on the procedure of Lemma 3.1. In Theorem 3.2 we shall show how we can explicitly construct PPS(N,n) sampling designs with the added property that $\Pi_{dij} > 0$. Examples are given to demonstrate the techniques.

We recall from Section 2 that the unit sizes $q_1, q_2, \dots, q_N$ in a PPS sampling should satisfy $q_i > 0$, $n q_i \le 1$ and $q_1 + q_2 + \dots + q_N = 1$. Here for all practical purposes we shall assume that all the $q_i$'s are rational.

Theorem 3.1. For any N, $n \le N$, and unit sizes $q_1, q_2, \dots, q_N$ there exists at least one PPS(N,n) sampling design.

Proof (by construction). Associate with the ith unit the integer $k_i = n q_i q$, where q is a common denominator of $q_1, q_2, \dots, q_N$, so that each $k_i$ is an integer. Now pretend that the N units are N boxes with the ith box containing $k_i$ objects. The N integers $k_1, k_2, \dots, k_N$ and the sample size n clearly satisfy condition (1) of Lemma 3.1. They also satisfy condition (2), since by assumption $n q_i \le 1$ and thus $k_i \le q = M/n$ for $M = k_1 + k_2 + \dots + k_N$.

Now by M/n rounds of size n empty these N boxes and keep a record of all rounds as we did in Example 3.1. Now our PPS(N,n) sampling design is defined as follows:

$S_d$: The Support. The set of n units in each round determines a sample in $S_d$. The (set theoretic) union of all these samples constitutes the support. Note that since there are M/n = q rounds in all, the cardinality of $S_d$ is at most q.

$P_d$: The Probability on the Support. If $s_d = \{i_1, i_2, \dots, i_n\}$ is a sample in $S_d$ then the probability on this sample, $P_d(s_d)$, is the proportion of rounds which produced this sample. Thus

$P_d(s_d) = r(s_d)/q,$

where $r(s_d)$ is the number of rounds of size n in which the units $i_1, i_2, \dots, i_n$ were chosen.
Indeed, $d = (S_d, P_d)$ so defined is a PPS(N,n) sampling design. Three things should be verified: (a) the union of samples in $S_d$ should satisfy condition (1.1); (b) $P_d$ is a strictly positive probability distribution on $S_d$; (c) $\Pi_{di} = n q_i$, i = 1, 2, ..., N.

(a) is obvious since $q_i > 0$ and thus $k_i > 0$, meaning that there is at least one sample (one round) which contains the ith unit.

(b) Clearly $P_d(s_d) = r(s_d)/q$ is a positive number not exceeding one, and

$\sum_{s_d \in S_d} P_d(s_d) = \frac{1}{q} \sum_{s_d \in S_d} r(s_d) = \frac{1}{q} \cdot (\text{no. of rounds} = M/n) = 1.$

(c) For all i,

$\Pi_{di} = \sum_{s_d \ni i} P_d(s_d) = \frac{1}{q} \sum_{s_d \ni i} r(s_d) = \frac{1}{q} k_i = n q_i.$
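The construction in the proof translates almost line by line into code. The sketch below is one possible reading, under two assumptions that are not part of the theorem: q is taken to be the least common denominator of the unit sizes, and each round draws from the n fullest boxes with ties broken by the lower box number. The function name `pps_design` is hypothetical, and `math.lcm` with several arguments requires Python 3.9 or later.

```python
from collections import Counter
from fractions import Fraction
from math import lcm


def pps_design(q, n):
    """Return {sample: probability} for one PPS(N, n) design built by rounds."""
    q = [Fraction(x) for x in q]
    if sum(q) != 1 or any(x <= 0 or n * x > 1 for x in q):
        raise ValueError("unit sizes are not admissible for this n")
    denom = lcm(*(x.denominator for x in q))      # the integer q of the proof
    residual = [int(n * x * denom) for x in q]    # k_i = n * q_i * q
    rounds = []
    for _ in range(denom):                        # M / n = q rounds in all
        order = sorted(range(len(residual)), key=lambda i: (-residual[i], i))
        chosen = sorted(order[:n])
        for i in chosen:
            residual[i] -= 1
        rounds.append(tuple(i + 1 for i in chosen))
    # P_d(s_d) = r(s_d) / q, the proportion of rounds producing the sample
    return {s: Fraction(r, denom) for s, r in Counter(rounds).items()}


sizes = [Fraction(k, 17) for k in (2, 3, 4, 1, 2, 5)]
design = pps_design(sizes, 3)
pi = [sum(p for s, p in design.items() if i in s) for i in range(1, 7)]
```

For the unit sizes used in the examples of the paper, the resulting `pi` equals $n q_i$ for every unit. This particular tie-breaking rule makes no attempt to cover all pairs of units, so a design obtained this way can have some $\Pi_{dij} = 0$; controlling that is a separate matter.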


Example 3.2. Let N = 6, n = 3 and $q_1 = 2/17$, $q_2 = 3/17$, $q_3 = 4/17$, $q_4 = 1/17$, $q_5 = 2/17$, $q_6 = 5/17$. In this case q = 17 and thus $k_1 = 6$, $k_2 = 9$, $k_3 = 12$, $k_4 = 3$, $k_5 = 6$ and $k_6 = 15$. We have already exhibited a table of rounds in Example 3.1 for this problem. So, let us exhibit the corresponding PPS(6,3) sampling design. For example, the 6 rounds 1, 2, 3, 6, 10, 15 determine the sample $s_d = \{2,3,6\}$ with $P_d(s_d) = 6/17$, and similarly for the rest of the samples and the corresponding probabilities.


sample       probability      rounds which produced the sample

2 3 6           6/17          1, 2, 3, 6, 10, 15
1 3 6           3/17          4, 7, 9
3 5 6           2/17          5, 13
2 5 6           1/17          8
1 5 6           1/17          11
2 4 6           1/17          12
1 4 5           1/17          14
3 4 6           1/17          16
1 2 5           1/17          17
Let us for example compute $\Pi_{d2}$. There are 4 samples in the support which contain unit 2. If we add up the probabilities over these 4 samples we obtain $\Pi_{d2} = 6/17 + 1/17 + 1/17 + 1/17 = 9/17$, which is equal to $n q_2 = 3(3/17)$. Also note that in this design $\Pi_{dij} > 0$ for all i, $j \ne i$. Finally, our design has excluded 11 samples out of $\binom{6}{3} = 20$ for sampling purposes.

Let us now look at the procedure outlined in Theorem 3.1 from other viewpoints. The only demand we formally imposed on the procedure in Theorem 3.1 was that $\Pi_{di}$ be proportional to $q_i$, as was specified in the definition of PPS sampling designs. Otherwise, we left the procedure very flexible so that we can adjust or modify it to produce desirable PPS sampling designs. If, for example, we further demand that $\Pi_{dij} > 0$ for all i, $j \ne i$, then we should adjust the procedure by observing the following facts.

Proposition 3.1. It is necessary that

(3.1)  $\min_i k_i \ge \lceil (N-1)/(n-1) \rceil$

in order that a sampling design generated by the procedure of Theorem 3.1 have the further property that $\Pi_{dij} > 0$ for all i, $j \ne i$. ($\lceil z \rceil$ denotes the smallest integer greater than or equal to z.)

Proof. The unit i appears in precisely $k_i$ rounds. Thus the number of samples in the support which contain the unit i is at most $k_i$. Since unit i appears with n-1 other units in each sample, in order that $\Pi_{dij} > 0$ for all $j \ne i$ the ith unit should appear in at least $\lceil (N-1)/(n-1) \rceil$ samples in the support. Therefore, it is necessary that $k_i \ge \lceil (N-1)/(n-1) \rceil$ for i = 1,2,...,N.
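Condition (3.1) is a one-line check. The sketch below uses integer ceiling division; the function names are hypothetical, and the two sets of box contents are those appearing in the worked examples of this paper.

```python
def pair_cover_bound(N, n):
    """Smallest integer >= (N - 1) / (n - 1); requires n >= 2."""
    return -(-(N - 1) // (n - 1))


def satisfies_3_1(k, n):
    """Necessary condition (3.1): min k_i >= ceil((N - 1) / (n - 1))."""
    return min(k) >= pair_cover_bound(len(k), n)


k_six_units = [6, 9, 12, 3, 6, 15]          # N = 6, n = 3
k_eight_units = [6, 9, 3, 15, 3, 6, 3, 9]   # N = 8, n = 3
```

For the six-unit system the bound is 3 and is met with equality, while for the eight-unit system the bound is 4 and the smallest $k_i$ is only 3, so (3.1) fails there. Since the condition is only necessary, passing it does not by itself guarantee that every pair is covered.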

Now we shall show that in case the $k_i$'s do not satisfy condition (3.1) we can artificially increase the values of the $k_i$'s so that there will be enough samples in the support to cover all pairs of units. In some cases it may be necessary to manipulate the values of the $k_i$'s even though the $k_i$'s satisfy the necessary condition (3.1). However, we recommend that this device be avoided whenever possible if we are interested in supports with not too many samples. Let us, for example, reconsider Example 3.1. In that example, had we chosen boxes 1, 2, 3 in round 16 and consequently boxes 4, 5, 6 in round 17, the resulting sampling design would have suffered from the undesirable property that $\Pi_{d34} = 0$.
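Keeping track of covered pairs is easy to automate. The following sketch lists the pairs of units that never occur together in a given support; the function name `uncovered_pairs` is hypothetical. It compares the support obtained from the rounds of Example 3.1 with the alternative in which rounds 16 and 17 use boxes 1, 2, 3 and 4, 5, 6.

```python
from itertools import combinations


def uncovered_pairs(support, N):
    """Pairs (i, j), i < j, that are contained in no sample of the support."""
    covered = {pair for s in support for pair in combinations(sorted(s), 2)}
    return [p for p in combinations(range(1, N + 1), 2) if p not in covered]


# Samples produced by rounds 1 to 15 of Example 3.1
common = [(2, 3, 6), (1, 3, 6), (3, 5, 6), (2, 5, 6),
          (1, 5, 6), (2, 4, 6), (1, 4, 5)]
as_chosen = common + [(3, 4, 6), (1, 2, 5)]      # rounds 16 and 17 as tabulated
alternative = common + [(1, 2, 3), (4, 5, 6)]    # the other possible ending

missing_as_chosen = uncovered_pairs(as_chosen, 6)
missing_alternative = uncovered_pairs(alternative, 6)
```

The tabulated choice leaves no pair uncovered, whereas the alternative ending leaves exactly the pair (3, 4) without a common sample.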

To  avoid  such  outcomes  we  should  keep  track  of  the  pairs  being  covered  as 

we  go  along  and  forming  the  rounds.  We  should  take  advantage  of  those  situations 

in  which  we  have  several  possibilities  for  the  formation  of  rounds.  In 

such  a  situation  we  should  select  a  round  which  help  in  covering  uncovered  pairs 

by  the  proceeding  rounds  as  we  did  in  Example  3.1.  As  we  mentioned  above 

in any case we can artificially increase the values of the k_i's to make sure that
enough samples are in the support to cover all the pairs. We shall now
explicitly indicate how to increase the values of the k_i's without putting too
many samples in the support. The procedure is applicable whether or not
the k_i's satisfy the necessary condition (3.1). However, we shall explain
it in the context of the case in which condition (3.1) is violated. Let
min_i k_i = k* and assume that k* < {(N-1)/(n-1)}. Proceed as in the procedure of
Theorem 3.1 until the stage at which the k̂_i's, the reduced values of the k_i's, are
very close to k*. (We do not need to be too formal and introduce a measure
of closeness since, as we shall see, the operation we shall apply can be
introduced at any round, even in round one.) Multiply all the k̂_i's by a
sufficiently large integer h so that

     h (min_i k_i) ≥ {(N-1)/(n-1)}

and go on with the remaining rounds with these artificially large remaining
values h k̂_i, i = 1,2,...,N. It is clear that if we select h large enough we
can cover all the pairs (i,j). The reason we do not recommend this operation
in round one, or early after that, is to avoid prolonging the procedure
and consequently having too many unwanted samples in the support. There is
a slight modification in forming the corresponding PPS sampling. The
support, as in Theorem 3.1, consists of those samples formed by the rounds.
As for the probabilities, if s_d is a sample then the probability over it is
computed by

     P_d(s_d) = r*(s_d)/q*,     q* = Σ r*(s_d)  (sum over all s_d in S_d),

where r*(s_d) = h r_1(s_d) + r_2(s_d) with

     r_1(s_d) = no. of rounds which produced s_d before the application of h

and

     r_2(s_d) = no. of rounds which produced s_d after the application of h.
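
The weighting rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name pps_with_multiplier and its arguments are hypothetical, each round is assumed to take the n largest residuals (ties broken by unit label), and the input is assumed to allow every round to be completed. It does not check that all pairs (i,j) end up covered.

```python
from collections import Counter
from fractions import Fraction


def pps_with_multiplier(k, n, rounds_before, h):
    """Return {sample: probability} for integer sizes k and sample size n.

    rounds_before rounds are run on k, the residuals are then multiplied
    by h, and the remaining rounds are run on the enlarged residuals.
    """
    resid = list(k)
    r1, r2 = Counter(), Counter()  # rounds producing each sample before / after h

    def one_round(counter):
        # Assumed rule: pick the n largest residuals, ties broken by unit label.
        picked = sorted(range(len(resid)), key=lambda i: (-resid[i], i))[:n]
        if resid[picked[-1]] == 0:
            raise ValueError("fewer than n units have a positive residual")
        for i in picked:
            resid[i] -= 1
        counter[tuple(sorted(i + 1 for i in picked))] += 1

    for _ in range(rounds_before):
        one_round(r1)
    resid[:] = [h * v for v in resid]  # introduce the multiplier h
    while any(resid):
        one_round(r2)

    # r*(s) = h r_1(s) + r_2(s), and q* is the sum of r*(s) over the support.
    r_star = {s: h * r1[s] + r2[s] for s in set(r1) | set(r2)}
    q_star = sum(r_star.values())
    return {s: Fraction(w, q_star) for s, w in r_star.items()}


# Sizes k_i = 6, 9, 3, 15, 3, 6, 3, 9 with n = 3, h = 2 introduced after round 9.
design = pps_with_multiplier([6, 9, 3, 15, 3, 6, 3, 9], n=3, rounds_before=9, h=2)
```

Because every round before the multiplier is counted h times, the inclusion probability of unit i stays proportional to k_i whatever tie-breaking rule is used; the particular samples in the support do depend on that rule.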

Now  we  give  an  example  to  explain  the  above  ideas. 

Example 3.3.  Let N = 8, n = 3 and the unit sizes q_i's as given below:

unit                   1      2      3      4      5      6      7      8
q_i                  2/18   3/18   1/18   5/18   1/18   2/18   1/18   3/18
3q_i                 6/18   9/18   3/18  15/18   3/18   6/18   3/18   9/18
k_i                    6      9      3     15      3      6      3      9

Here min_i k_i = 3 < {7/2} = 4.

unit                   1      2      3      4      5      6      7      8
k_i                    6      9      3     15      3      6      3      9
Rounds 1,2,3                  3             3                           3
Residuals              6      6      3     12      3      6      3      6
Round 4                1      1             1
Residuals              5      5      3     11      3      6      3      6
Round 5                                     1             1             1
Residuals              5      5      3     10      3      5      3      5
Round 6                       1             1             1
Residuals              5      4      3      9      3      4      3      5
Round 7                1                    1                           1
Residuals              4      4      3      8      3      4      3      4
Round 8                       1             1                           1
Residuals              4      3      3      7      3      4      3      3
Round 9                1                    1             1
Residuals              3      3      3      6      3      3      3      3
Introduce h = 2        6      6      6     12      6      6      6      6
Rounds 10 to 24        .      .      .      .      .      .      .      .
Residuals              1      1      1      1      1      2      1      1
Round 25               1      1                           1
Residuals              0      0      1      1      1      1      1      1
Round 26                             1      1      1
Residuals              0      0      0      0      0      1      1      1
Round 27                                                  1      1      1
Residuals              0      0      0      0      0      0      0      0

Note that we increased the values of the residuals k̂_i's at the end of round 9,
by which stage these values were close to the initial minimum, min_i k_i = 3.
Since in this case h = 2, each round before round 10 is counted twice in computing
the probabilities. The resulting PPS(8,3) with all Π_ij > 0 is given below.


sample    probability     sample    probability     sample    probability

2 4 8        9/36         4 7 8        1/36         3 6 8        1/36
1 2 4        4/36         4 6 7        1/36         1 3 7        1/36
4 6 8        3/36         1 3 4        1/36         2 3 5        1/36
2 4 6        2/36         4 5 6        1/36         5 7 8        1/36
1 4 8        2/36         1 4 5        1/36         1 2 6        1/36
1 4 6        2/36         2 4 7        1/36         6 7 8        1/36
3 4 5        2/36

This PPS sampling has excluded C(8,3) - 19 = 37 samples from the support.
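
The figures of Example 3.3 can be checked mechanically. The short Python sketch below is an illustration only (the variable names are ours, not the paper's): it enters the 19 samples with their numerators over 36, recomputes the first-order inclusion probability of each unit, and counts the samples left out of the support.

```python
from fractions import Fraction
from math import comb

# Numerators over 36 of the PPS(8,3) design of Example 3.3.
design_33 = {
    (2, 4, 8): 9, (1, 2, 4): 4, (4, 6, 8): 3, (2, 4, 6): 2, (1, 4, 8): 2,
    (1, 4, 6): 2, (3, 4, 5): 2, (4, 7, 8): 1, (4, 6, 7): 1, (1, 3, 4): 1,
    (4, 5, 6): 1, (1, 4, 5): 1, (2, 4, 7): 1, (3, 6, 8): 1, (1, 3, 7): 1,
    (2, 3, 5): 1, (5, 7, 8): 1, (1, 2, 6): 1, (6, 7, 8): 1,
}
total = sum(design_33.values())  # q*, the common denominator

# Inclusion probability of unit i: total probability of samples containing i.
pi = {
    i: Fraction(sum(w for s, w in design_33.items() if i in s), total)
    for i in range(1, 9)
}

# Number of the C(8,3) possible samples that are not in the support.
excluded = comb(8, 3) - len(design_33)

for i in range(1, 9):
    print(i, pi[i])
print("excluded samples:", excluded)
```

Each inclusion probability agrees with 3q_i = k_i/18 from the table of unit sizes, which is the defining property of a PPS(8,3) design.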

We now summarize in Theorem 3.2 what we have discovered above.

Theorem 3.2.  For any N, n < N and unit sizes q_1, q_2, ..., q_N there are
various probability proportional to size sampling designs with various support
sizes and varieties of probabilities on each support with Π_ij > 0, for
all i, j ≠ i. These sampling designs could be used for the purpose of
controlled sampling.


In conclusion, we would like to point out that as long as the k_i's, or
the k̂_i's, satisfy the conditions of Lemma 5.1 our procedure will succeed.
So we do not have to choose the largest n boxes at every round. This fact
together with the technique of introducing a multiplier makes our procedure
even more flexible. The following example should demonstrate the point.

Example 3.4.  As in Example 3.2, let N = 6, n = 3 and the unit sizes q_i's as follows:

unit                   1      2      3      4      5      6
q_i                  2/17   3/17   4/17   1/17   2/17   5/17
3q_i                 6/17   9/17  12/17   3/17   6/17  15/17
k_i                    6      9     12      3      6     15
Introduce h = 2       12     18     24      6     12     30
Rounds 1 to 31         .      .      .      .      .      .
Residuals              0      1      1      2      2      3
Round 32                                    1      1      1
Residuals              0      1      1      1      1      2
Round 33                      1                    1      1
Residuals              0      0      1      1      0      1
Round 34                             1      1             1
Residuals              0      0      0      0      0      0

The PPS(6,3) sampling design produced by this modified procedure is

sample    probability     sample    probability

1 2 3        1/34         3 5 6        6/34
2 3 4        1/34         1 2 6        3/34
2 3 5        1/34         2 4 6        2/34
1 4 5        1/34         2 5 6        3/34
2 3 6        7/34         4 5 6        1/34
1 3 6        7/34         3 4 6        1/34

which is the sampling design in Example 2.2.
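
As a check on the requirement Π_ij > 0, the joint inclusion probabilities of the PPS(6,3) design above can be tabulated directly. The Python sketch below is an illustration only, with variable names of our own choosing; it sums, for every pair (i,j), the probabilities of the samples containing both units.

```python
from fractions import Fraction
from itertools import combinations

# Numerators over 34 of the PPS(6,3) design of Example 3.4.
design_34 = {
    (1, 2, 3): 1, (2, 3, 4): 1, (2, 3, 5): 1, (1, 4, 5): 1,
    (2, 3, 6): 7, (1, 3, 6): 7, (3, 5, 6): 6, (1, 2, 6): 3,
    (2, 4, 6): 2, (2, 5, 6): 3, (4, 5, 6): 1, (3, 4, 6): 1,
}
q_star = sum(design_34.values())

# Joint inclusion probability of the pair (i, j), i < j.
pair_prob = {
    (i, j): Fraction(
        sum(w for s, w in design_34.items() if i in s and j in s), q_star
    )
    for i, j in combinations(range(1, 7), 2)
}

smallest = min(pair_prob.values())
print("pairs covered:", sum(p > 0 for p in pair_prob.values()), "of", len(pair_prob))
print("smallest joint inclusion probability:", smallest)
```

All 15 pairs receive positive probability, the smallest being 1/34 (for the pairs (1,4) and (1,5), which appear together only in the sample 1 4 5).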